Abstract - With the emergence of Web 2.0 applications that 
offer a rich user experience and convenience without time or 
geographical restrictions, web usage logs have become a goldmine 
for researchers across the globe. Analysis of user behavior based 
on web logs helps enterprises in different domains make 
strategic decisions. Business growth depends on customer-centric 
approaches, which require knowledge of customer behavior to 
succeed, because customers have alternatives and competition is 
intense. The business community therefore needs business 
intelligence to make expert decisions, besides focusing on 
customer relationship management. Many researchers have 
contributed towards this end. However, there is still a need for 
a comprehensive framework that helps businesses ascertain the 
real needs of web users. This paper presents a framework named 
extensible Web Usage Mining Framework (XWUMF) for discovering 
actionable knowledge from web log data. The framework employs a 
hybrid approach that combines fuzzy clustering methods with 
methods for user behavior analysis. Moreover, the framework is 
extensible, as it can accommodate new algorithms for fuzzy 
clustering and user behavior analysis. We propose an algorithm 
known as Sequential Web Usage Miner (SWUM) for efficient mining 
of web usage patterns from different datasets. We built a 
prototype application to validate our framework. Our empirical 
results revealed that the framework helps in discovering 
actionable knowledge. 

Keywords- Knowledge discovery; Web usage mining; Fuzzy 
clustering; Business intelligence. 

I. Introduction 

Enterprises need to know the web usage patterns of their 
customers, so this research is useful for ascertaining customer 
behavior and devising strategies to improve customer 
satisfaction. Web usage mining for user behaviour analysis has 
many real-world uses. Building a framework that caters to the 
needs of enterprises for user behaviour analysis is a 
challenging task, but it is very useful for helping the business 
community make expert decisions. User behaviour analysis needs 
different kinds of algorithms. The proposed framework provides 
placeholders for accommodating any kind of usage mining 
algorithm and fuzzy logic with combined processing. Fuzzy 
clustering provides soft clusters that can be subjected to web 
usage mining to find useful patterns. When interpreted by domain 
experts, the patterns can yield business intelligence. Thus the 
proposed research has an impact on the business community and 
the consumer base as well. We used the Wycombe District Council 
(WDC) dataset collected from Internet sources [1]. An excerpt of 
the dataset is shown in Figure 1. 
Figure 1. An excerpt from WDC dataset. 


Different techniques have emerged for user behavior analysis. 
However, we felt that a framework providing extensible features 
for user behavior analysis is needed. In this paper we propose 
and implement a framework which is generic but provides 
placeholders for various future technologies. The framework is 
extensible and even supports personalized settings for user 
behavior analysis. The remainder of the paper is structured as 
follows. Section II reviews the literature. Section III presents 
the proposed framework. Section IV presents experimental results, 
while Section V concludes the paper and gives directions for 
future work. 

II. Related Work 

Vythoulkas and Koutsopolous [16] employed neural 
networks, approximate reasoning, and fuzzy set theory for 
modelling choice behavior. They assumed that simple rules are 
used by decision makers. Rules and rule weights are used in the 
process of behavior analysis. Botha and Solms [17] used trend 
analysis and fuzzy logic for modeling behavior of intruders. 
Their approach is proactive, based on the combination of trend 
analysis and fuzzy logic. 

Volume: 02, Issue: 03, March 2017 
ISBN: 978-0-9957075-4-2 
www.iieacs.com 
DOI: 10.24032/ijeacs/0203/02 
N. Pushpalatha et al. 
(IJEACS) International Journal of Engineering and Applied Computer Science 

Xia, Ho and Capretz [18] used neuro-fuzzy logic for analyzing 
software usage trends in the industry. 
Anderson et al. [19] focused on the analysis of human behavior 
with respect to falls. Fuzzy logic and a voxel person 
representation are used to detect falls in pre-recorded videos. 
Wang et al. [20] employed neural networks and fuzzy logic for 
modeling the behavior of attackers with respect to intrusions. 
They used the KDD CUP 1999 dataset for analyzing the behavior of 
intruders. Mitrovic et al. [21] explored the behavior of 
bloggers with quantitative analysis. They focused on behavior 
that is reflected by emotions, and combined machine learning and 
statistical physics to analyze the emotional behavior of web 
users. 

Adadeh, Mohamadi, and Habibi [22] used genetic fuzzy 
systems for analyzing malicious users. They employed iterative 
rule learning to gain knowledge of user behavior. A good survey 
of fuzzy web mining can be found in [23], which covers 
techniques pertaining to fuzzy web structure mining, fuzzy web 
content mining and fuzzy web usage mining. Velesquez [24] 
combined web usage mining and eye-tracking technologies for 
classifying web site key objects, which provided a more 
effective means of web usage mining. He [25] focused on case 
based reasoning (CBR) and text mining for understanding and 
improving user experience, and observed that text mining and 
Web 2.0 usage can yield more useful information about user 
behavior. Cruz-Benito et al. [26] explored an educational 
virtual world and discovered the usage behavior of users in the 
learning process. Conti et al. [27] studied user behavior 
pertaining to Android application usage, focusing on user 
actions and trends in the use of Android applications. Vu et al. 
[28] focused on the travel behavior of tourists, using geotagged 
photos for user behavior analysis. Abello et al. [29] surveyed 
semantic web technologies used for Online Analytical Processing 
(OLAP), which can be used for user behavior analysis. 

The different approaches employed in the literature are 
good for specific purposes. However, we found that there is a 
need for a comprehensive framework with flexible and 
extensible technologies that can cope with future technologies 
as well. In this paper we proposed and implemented a 
framework which is generic but provides placeholders for 
various future technologies. The framework is extensible and 
even supports personalized settings for user behavior analysis. 

III. Proposed Framework 

We propose a framework which is generic in nature and 
accommodates future technologies in order to improve user 
behavior analysis, which in turn helps in gaining knowledge of 
web usage. The framework is named extensible Web Usage Mining 
Framework (XWUMF). It provides reusable components, or building 
blocks, that can be used along with customized logic. The 
framework supports a hybrid approach in which fuzzy clustering 
techniques and web mining techniques work together to provide an 
effective user behavior analysis mechanism. Memory usage and 
time taken are the two performance evaluation parameters it 
supports for every operation in the framework. The framework 
accommodates new pre-processing techniques, fuzzy clustering 
techniques and web mining techniques, which makes it flexible 
and extensible. Before presenting our framework, the overview of 
general web usage mining is shown in Figure 2. 



Figure 2. General Web Usage Mining Overview. 


Web log data collected from web servers is subjected to 
pre-processing and then to user behavior analysis in order to 
obtain business intelligence. In this approach the web log data 
comes from different sources. The data is pre-processed to 
improve its quality by handling missing values. Then the data is 
subjected to usage mining, pattern discovery and pattern 
analysis. Finally, the method results in business intelligence 
(BI), in the form of well-structured patterns that have been 
interpreted by a domain expert. 
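
The pre-processing step described above can be sketched in Java, the platform of our prototype. This is a minimal sketch under stated assumptions rather than the framework's actual code: the comma-separated row layout (user, page, usage time) and the class name LogPreprocessor are hypothetical, and missing values are handled in the simplest possible way, by dropping incomplete rows. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-processing step: drops web log rows with missing values.
// The three-field layout "user,page,usageTime" is an assumption made for
// illustration; it is not the actual WDC log format.
final class LogPreprocessor {
    static final int EXPECTED_FIELDS = 3;

    // Returns only the rows in which every expected field is present.
    static List<String[]> clean(List<String> rawLines) {
        List<String[]> cleaned = new ArrayList<>();
        for (String line : rawLines) {
            // limit -1 keeps trailing empty fields, so "u3,/news," has 3 fields
            String[] fields = line.split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                continue; // malformed row
            }
            boolean complete = true;
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
                if (fields[i].isEmpty()) {
                    complete = false; // a value is missing
                }
            }
            if (complete) {
                cleaned.add(fields);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "u1,/home,1200", "u2,,300", "u3,/news,", "u4,/contact,4500");
        for (String[] row : clean(raw)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Other strategies, such as imputing a default value instead of dropping the row, would fit the same method signature. 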



As shown in Figure 3, the web log files collected from 
various sources are subjected to pre-processing, as in the 
general approach described above. After pre-processing, the 
framework supports a hybrid approach which combines fuzzy 
clustering and user behavior analysis. The framework has 
placeholders for pre-processing techniques, fuzzy clustering 
techniques, user behavior analysis and optimization. The 
framework is flexible and helps developers to build new 
techniques by referring to existing ones. It is designed so that 
future enhancements can be accommodated. 
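
One way to picture such placeholders is as a small Java interface that every technique implements. The sketch below is only an assumption about how extension points of this kind could look; it is not the XWUMF implementation, and the names Stage, Pipeline, DropBlankRows and LowerCasePages are hypothetical. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical extension point: each technique (pre-processing, fuzzy
// clustering, usage mining) is a stage that transforms a list of log rows.
interface Stage {
    List<String> apply(List<String> rows);
}

// Example stage: removes blank rows.
final class DropBlankRows implements Stage {
    public List<String> apply(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (!row.trim().isEmpty()) {
                out.add(row);
            }
        }
        return out;
    }
}

// Example stage: normalizes page names to lower case.
final class LowerCasePages implements Stage {
    public List<String> apply(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(row.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Runs the registered stages in order; a new technique plugs in via register().
final class Pipeline {
    private final List<Stage> stages = new ArrayList<>();

    Pipeline register(Stage stage) {
        stages.add(stage);
        return this;
    }

    List<String> run(List<String> rows) {
        List<String> current = rows;
        for (Stage stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline()
                .register(new DropBlankRows())
                .register(new LowerCasePages());
        System.out.println(pipeline.run(Arrays.asList("/Home", " ", "/News")));
    }
}
```

With this shape, adding a new algorithm means writing one more Stage implementation; the existing stages and the pipeline itself stay unchanged. 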
The flexible and generic framework can be realized with web 
mining as a solution for extracting business intelligence from 
various sources of data. The measures considered for 
evaluating the proposed work are memory usage and time taken 
to perform various operations. Our empirical study in this paper 
is limited to the web mining technique only. The full 
implementation details will be presented in our next research 
paper. The algorithm we implemented as part of the framework 
is shown here. 


Algorithm: Sequential Web Usage Miner 

Inputs : Sequence Web Log Data, mintime, minconf 

Output : User Behavior in terms of Usage Patterns 

Initialize usage time vector UT 
Initialize web log data vector WLD 
Initialize rules vector R 
Initialize final rules vector R' 
Load sequence web log data into WLD 

Compute Usage Time 
For each row in WLD 
    Compute usage time ut 
    Add ut to UT 
    Associate UT with WLD 
End For 

Prune Search Space 
For each row in WLD 
    IF ut < mintime THEN 
        Remove row from WLD 
    END IF 
End For 

Compute Rules from WLD 
Compute rules into R 

Validate Rules 
For each r in R 
    Compute conf_WLD(r) 
    IF conf_WLD(r) >= minconf THEN 
        Add r to R' 
    END IF 
End For 

Output R' 

Algorithm 1. Sequential web usage miner 

The algorithm, named Sequential Web Usage Miner (SWUM), 
takes sequence web log data, a minimum time (MinTime), and a 
minimum confidence as inputs, and generates patterns that 
reflect user behavior. First, the usage time of the web pages is 
computed from the data provided in the dataset. For instance, 
the usage time of users for different web pages is shown in 
Fig. 1. The usage time is compared with the MinTime parameter to 
filter rows out of further processing. The minimum confidence 
provides a further statistical measure for obtaining quality 
patterns. Together, the MinTime and minimum confidence 
parameters prune the search space, which can improve the 
performance of the algorithm for user behavior analysis. Rules 
are then computed from the web log data; they reflect the trends 
in web usage. Finally, the rules are validated to produce the 
final results in the form of web usage trends. 
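
The steps of the algorithm can be illustrated with a short Java sketch. It is not the prototype's code, and several details are assumptions because the listing leaves them open: each row is assumed to carry a user, a page, and start and end timestamps; usage time is assumed to be end minus start; rows whose usage time is below MinTime are assumed to be the ones pruned; a rule is assumed to be a transition "A -> B" between consecutive rows of the same user; and confidence is assumed to be the count of A -> B divided by the count of all transitions leaving A. The rows are expected to be ordered by user and then by time. 

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the SWUM steps. Row layout, usage time formula,
// and rule form are assumptions, not details taken from the paper.
final class SwumSketch {

    // One web log row: a user viewing a page between two timestamps.
    static final class Row {
        final String user;
        final String page;
        final long start;
        final long end;
        long usageTime; // set by computeUsageTime

        Row(String user, String page, long start, long end) {
            this.user = user;
            this.page = page;
            this.start = start;
            this.end = end;
        }
    }

    // Compute Usage Time: assumed to be end minus start for each row.
    static void computeUsageTime(List<Row> wld) {
        for (Row row : wld) {
            row.usageTime = row.end - row.start;
        }
    }

    // Prune Search Space: keep only rows whose usage time reaches minTime.
    static List<Row> prune(List<Row> wld, long minTime) {
        List<Row> kept = new ArrayList<>();
        for (Row row : wld) {
            if (row.usageTime >= minTime) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Compute rules "A -> B" from consecutive rows of the same user, then
    // keep those with conf(A -> B) = count(A -> B) / count(A -> any) >= minConf.
    static Map<String, Double> mine(List<Row> wld, long minTime, double minConf) {
        computeUsageTime(wld);
        List<Row> kept = prune(wld, minTime);

        Map<String, Integer> ruleCount = new LinkedHashMap<>();
        Map<String, Integer> antecedentCount = new LinkedHashMap<>();
        Map<String, String> antecedentOf = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kept.size(); i++) {
            Row a = kept.get(i);
            Row b = kept.get(i + 1);
            if (!a.user.equals(b.user)) {
                continue; // do not link pages across different users
            }
            String rule = a.page + " -> " + b.page;
            ruleCount.merge(rule, 1, Integer::sum);
            antecedentCount.merge(a.page, 1, Integer::sum);
            antecedentOf.put(rule, a.page);
        }

        Map<String, Double> finalRules = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : ruleCount.entrySet()) {
            int leaving = antecedentCount.get(antecedentOf.get(e.getKey()));
            double conf = e.getValue() / (double) leaving;
            if (conf >= minConf) {
                finalRules.put(e.getKey(), conf);
            }
        }
        return finalRules;
    }

    public static void main(String[] args) {
        List<Row> log = new ArrayList<>();
        log.add(new Row("u1", "/home", 0, 6000));
        log.add(new Row("u1", "/news", 6000, 12000));
        log.add(new Row("u1", "/home", 12000, 19000));
        log.add(new Row("u1", "/contact", 19000, 25000));
        System.out.println(mine(log, 5000, 0.5));
    }
}
```

Because pruning happens before rule generation, a larger MinTime leaves fewer rows and therefore fewer candidate rules to validate. 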


IV. Experimental Results 

We ran experiments on a PC with an i7 processor running the 
Windows 10 operating system. We built a prototype 
application using the Java platform. The Java IO package is used 
to read the datasets. The Java Swing Application 
Programming Interface (API) is used for the Graphical 
User Interface (GUI). We used the Java Collections API for 
storing data and performing web usage mining. We used the 
proposed algorithm for mining user behavior in terms of usage 
patterns. Four datasets are used for the experiments. The first 
is the Wycombe District Council (WDC) dataset, which is 
collected from [1]. The other three are synthesized 
datasets, used for a better evaluation of the proposed 
algorithm. The important observations are the web usage patterns 
obtained from the datasets and the performance measures, 
namely execution time and memory usage. Execution time 
indicates how fast the proposed algorithm works, while memory 
usage indicates how much main memory is needed to 
process the data for user behavior analysis. 
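
The two measures can be collected in Java along the following lines. This is a sketch of one common approach and not necessarily how the prototype measured them: the class name PerfProbe is hypothetical, execution time is taken with System.nanoTime, and memory is approximated as the growth of the used heap reported by Runtime, which the garbage collector can distort. 

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the two performance measures: wall-clock
// execution time in seconds and approximate heap growth in megabytes.
final class PerfProbe {
    final double seconds;
    final double megabytes;

    PerfProbe(double seconds, double megabytes) {
        this.seconds = seconds;
        this.megabytes = megabytes;
    }

    // Bytes of heap currently in use, as seen by the JVM.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the task once and records elapsed time and heap growth.
    static PerfProbe measure(Runnable task) {
        Runtime.getRuntime().gc(); // only a hint; the JVM may ignore it
        long memBefore = usedHeap();
        long timeBefore = System.nanoTime();
        task.run();
        long timeAfter = System.nanoTime();
        long memAfter = usedHeap();
        double seconds = (timeAfter - timeBefore) / 1e9;
        // Clamp at zero: a collection during the task can shrink the heap.
        double megabytes = Math.max(0L, memAfter - memBefore) / (1024.0 * 1024.0);
        return new PerfProbe(seconds, megabytes);
    }

    public static void main(String[] args) {
        final List<long[]> sink = new ArrayList<>();
        PerfProbe p = measure(() -> {
            for (int i = 0; i < 100; i++) {
                sink.add(new long[10_000]); // allocate some memory to observe
            }
        });
        System.out.printf("time = %.3f s, memory = %.3f MB%n", p.seconds, p.megabytes);
    }
}
```

Single runs measured this way are noisy, so repeated runs and averaged values give a more reliable picture. 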

A. Results with MinTime between 5000 and 20000 

We ran experiments on all four datasets with MinTime 
values between 5000 and 20000 seconds. MinTime plays a role 
similar to the support measure in data mining and is used to 
obtain quality results. Table 1 shows the execution times for 
the different datasets observed while performing user behavior 
analysis. 


Min Time   Execution Time (Sec) 
           WDC      Dataset 2   Dataset 3   Dataset 4 
5000       0.220    2.828       6.409       14.084 
10000      0.193    2.351       5.043       12.295 
15000      0.196    2.508       5.354       13.685 
20000      0.223    3.520       6.241       13.098 


Table 1. Execution time (s) for different datasets 


As Table 1 shows, execution time depends mainly on the size of 
the dataset and changes only modestly with the MinTime 
parameter; it does not increase steadily as MinTime grows. Part 
of the time is spent filtering out tuples in the dataset. For 
the fourth dataset, the execution time is 14.084 seconds and 
12.295 seconds for MinTime 5000 and 10000, and 13.685 seconds 
and 13.098 seconds for MinTime 15000 and 20000. The 
visualization of the results is shown in Figure 4. 



Figure 4. Execution time for different datasets when the MinTime 
parameter range is between 5000 and 20000 (x-axis: mintime). 
The MinTime parameter influences the time taken for user 
behavior analysis. At the same time, the time taken differs 
across datasets because of their sizes. 

B. Results with MinTime between 25000 and 40000 


Min Time   Execution Time (Sec) 
           WDC      Dataset 2   Dataset 3   Dataset 4 
25000      0.218    2.724       5.562       13.609 
30000      0.312    3.126       5.425       13.882 
35000      0.199    3.113       7.428       12.191 
40000      0.143    2.559       5.781       14.513 


Table 2. Execution time for different datasets when MinTime parameter range 
is between 25000 and 40000 

This experiment uses a different MinTime range, between 
25000 and 40000. For WDC, the execution time rises from 0.218 
seconds at MinTime 25000 to 0.312 seconds at 30000 and then 
falls to 0.143 seconds at 40000. For the fourth dataset, the 
execution time is 13.609 and 13.882 seconds for MinTime 25000 
and 30000, and 12.191 and 14.513 seconds for MinTime 35000 and 
40000. 



Figure 5. Execution times taken for user behavior analysis for different 
datasets with MinTime parameter range between 25000 and 40000. 

As shown in Fig. 5, execution time increases from WDC to 
Dataset 4 for every MinTime value. Within a dataset the trend 
across MinTime values is not uniform: Dataset 4 is slowest at 
MinTime 40000, while WDC is fastest at that value. 


MinTime    Memory (MB) 
           WDC        Dataset 2   Dataset 3   Dataset 4 
5000       5.947838   119.8514    184.6231    335.3677 
10000      5.98597    121.2761    177.7986    320.0214 
15000      5.928062   107.6817    183.1708    330.2894 
20000      5.928093   116.6732    182.8204    303.5607 


Table 3. Memory Usage performance of user behavior analysis for different 
datasets with MinTime parameter range between 5000 and 20000. 


As shown in Table 3, the proposed algorithm needed about 5.9 
MB of main memory for user behavior analysis on the WDC 
dataset. For each of the three synthetic datasets, memory usage 
remains broadly similar across MinTime values. The maximum 
memory is consumed when processing Dataset 4, while the minimum 
is consumed for the WDC dataset. 


Figure 6. Memory Usage taken for user behavior analysis for different 
datasets with MinTime parameter range between 5000 and 20000. 

As shown in Fig. 6, WDC needed less main memory than the 
other datasets. The proposed algorithm makes use of main 
memory for user behavior analysis. 


MinTime    Memory (MB) 
           WDC        Dataset 2   Dataset 3   Dataset 4 
25000      5.920708   109.8285    190.4586    329.9598 
30000      5.90757    116.37      186.1025    320.612 
35000      5.923828   120.343     181.3618    306.5138 
40000      5.9076     117.943     182.3883    334.3728 


Table 4. Memory usage of user behavior analysis for different 
datasets with MinTime parameter range between 25000 and 40000. 

As shown in Table 4, the proposed algorithm needed about 5.9 
MB of main memory for user behavior analysis on the WDC 
dataset. For each of the three synthetic datasets, memory usage 
remains broadly similar across MinTime values. The maximum 
memory is consumed when processing Dataset 4, while the minimum 
is consumed for the WDC dataset. 
Figure 7. Memory usage of user behavior analysis for different 
datasets with MinTime parameter range between 25000 and 40000. 

As shown in Fig. 7, WDC needed less main memory than the 
other datasets. The proposed algorithm makes use of main 
memory for user behavior analysis. 

V. Conclusions And Future Work 

In this paper we studied the problem of web usage mining. 
We proposed a framework named extensible Web Usage 
Mining Framework (XWUMF). The framework supports a 
hybrid approach for processing web log data, which captures 
the usage behavior of customers. User behavior analysis 
is made using the combination of web mining and fuzzy logic. 
The proposed framework is flexible and extensible so as to 
support different combinations of techniques in future. User 
behavior analysis can be made using the proposed algorithm in 
different domains, as the framework is not tied to any 
particular domain. Enterprises can pursue a customer-centric 
approach by using the framework for user behavior analysis. 
Business intelligence of this kind is essential, as there is 
intense competition among businesses in the real world. In this 
paper we implemented a web usage mining algorithm named 
Sequential Web Usage Miner (SWUM) for efficient mining of 
web usage patterns from different datasets. We used four 
datasets to evaluate the efficiency of the proposed algorithm. 
We built a prototype application to validate our framework. 
Our empirical results revealed that the framework helps in 
discovering actionable knowledge. In future work we will 
implement the rest of the framework to obtain more accurate 
results in web usage mining. 